On the error of estimating the sparsest solution of underdetermined linear systems
Let A be an n by m matrix with m>n, and suppose that the underdetermined
linear system As=x admits a sparse solution s0 for which ||s0||_0 < (1/2)
spark(A). Such a sparse solution is unique due to a well-known uniqueness
theorem. Suppose now that we have somehow obtained a solution s_hat as an
estimate of s0, and suppose that s_hat is only 'approximately sparse', that is,
many of its components are very small and nearly zero, but not mathematically
equal to zero. Is such a solution necessarily close to the true sparsest
solution? More generally, is it possible to construct an upper bound on the
estimation error ||s_hat-s0||_2 without knowing s0? The answer is positive, and
in this paper we construct such a bound based on minimal singular values of
submatrices of A. We also state a tight bound, which is more complicated but,
besides being tight, enables us to study the case of random dictionaries and
obtain probabilistic upper bounds. We also study the noisy case, that is, where
x=As+n. Moreover, we will see that as ||s0||_0 grows, obtaining a predetermined
guarantee on the maximum of ||s_hat-s0||_2 requires s_hat to be sparse with a
better approximation. This can be seen as an explanation of the fact that the
estimation quality of sparse recovery algorithms degrades as ||s0||_0 grows.
Comment: To appear in December 2011 issue of IEEE Transactions on Information
Theory
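The uniqueness condition and the quantities named in the abstract above can be illustrated on a toy dictionary. The following is a minimal sketch, not the paper's bound: the function names (spark, is_unique_sparsest, min_singular_value), the example matrix A, and the vector s0 are all assumptions introduced here for illustration. It uses a brute-force search over column subsets, which is only feasible for very small matrices.

```python
from itertools import combinations

import numpy as np


def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force).

    Returns m + 1 if all m columns are independent (a convention assumed here;
    it cannot occur when m > n).
    """
    m = A.shape[1]
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            # k columns are dependent exactly when their rank is below k.
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return m + 1


def is_unique_sparsest(A, s0):
    """Check the uniqueness condition ||s0||_0 < (1/2) spark(A)."""
    return np.count_nonzero(s0) < spark(A) / 2


def min_singular_value(A, k):
    """Smallest singular value over all submatrices made of k columns of A.

    This is only the kind of ingredient the abstract mentions; how the paper
    combines such values into a bound is not reproduced here.
    """
    m = A.shape[1]
    return min(
        np.linalg.svd(A[:, list(cols)], compute_uv=False).min()
        for cols in combinations(range(m), k)
    )


# Hypothetical 2-by-4 dictionary: every pair of columns is independent,
# and any three columns in R^2 are dependent.
A = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])
s0 = np.array([0.0, 0.0, 2.0, 0.0])  # a 1-sparse solution
x = A @ s0

print("spark(A) =", spark(A))
print("s0 is the unique sparsest solution:", is_unique_sparsest(A, s0))
print("min singular value over single columns:", min_singular_value(A, 1))
```

For this toy matrix the spark is 3, so the condition guarantees uniqueness only for 1-sparse solutions. The subset enumeration grows combinatorially with m, which is one reason bounds that avoid computing the spark directly are of interest.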
Linking genomics and metabolomics to chart specialized metabolic diversity
Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, and the cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery.
Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery
Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.
HypoRiPPAtlas as an Atlas of hypothetical natural products for mass spectrometry database search
Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, most of whose natural products remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.
A community resource for paired genomic and metabolomic data mining
Genomics and metabolomics are widely used to explore specialized metabolite diversity. The Paired Omics Data Platform is a community initiative to systematically document links between metabolome and (meta)genome data, aiding identification of natural product biosynthetic origins and metabolite structures. Peer reviewed
American Gut: an Open Platform for Citizen Science Microbiome Research
McDonald D, Hyde E, Debelius JW, et al. American Gut: an Open Platform for Citizen Science Microbiome Research. mSystems. 2018;3(3):e00031-18
microbeMASST: A Taxonomically-informed Mass Spectrometry Search Tool for Microbial Metabolomics Data
microbeMASST, a taxonomically informed mass spectrometry (MS) search tool, tackles limited microbial metabolite annotation in untargeted metabolomics experiments. Leveraging a curated database of >60,000 microbial monocultures, users can search known and unknown MS/MS spectra and link them to their respective microbial producers via MS/MS fragmentation patterns. Identification of microbe-derived metabolites and their respective producers without a priori knowledge will vastly enhance the understanding of microorganisms’ role in ecology and human health.
Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking
The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well-suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data.